Semi-supervised Distributed Clustering with Mahalanobis Distance Metric Learning

Authors

  • Martin Yuecheng Yu
  • Jiandong Wang
  • Guansheng Zheng
  • Bin Gu
Abstract

Semi-supervised clustering uses a small amount of supervised information to aid unsupervised learning. As one of the semi-supervised clustering methods, metric learning has been widely used to cluster centralized data points. However, many data sets are distributed and cannot be centralized for various reasons. Based on the MPCK-MEANS framework [1], a method for distributed Mahalanobis distance learning, called Distributed MPCK-MEANS, is proposed. Like MPCK-MEANS, Distributed MPCK-MEANS adopts an EM iteration framework to perform clustering and metric learning simultaneously. To decrease communication costs and to protect data security and privacy, corresponding transmission parameters are designed, and the transmission techniques are realized as well. The proposed method differs from Parallel K-means, though it resembles it to some degree. The proposed method is tested on artificial and real data sets. Compared with Parallel K-means, Distributed MPCK-MEANS effectively improves the quality of clustering.
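The full text is not available on this page. As a rough illustration of the constraint-penalized assignment step that MPCK-MEANS builds on, here is a minimal PCK-means-style sketch in Python; the `pckmeans` helper, the penalty weight `w`, and the deterministic initialization are assumptions made for this example, and the per-cluster Mahalanobis metric learning that MPCK-MEANS adds in the M-step is omitted for brevity.

```python
import numpy as np

def pckmeans(X, k, must_link, cannot_link, w=10.0, n_iter=20):
    """Constraint-penalized k-means (PCK-means-style), a simplified cousin
    of MPCK-MEANS without the per-cluster metric-learning step."""
    centers = X[:k].copy()            # deterministic init for reproducibility
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # E-step: greedily assign each point, charging w for every violated
        # must-link / cannot-link constraint (uses current labels of peers)
        for i in range(len(X)):
            costs = np.sum((centers - X[i]) ** 2, axis=1)
            for c in range(k):
                for a, b in must_link:
                    j = b if a == i else a if b == i else None
                    if j is not None and labels[j] != c:
                        costs[c] += w
                for a, b in cannot_link:
                    j = b if a == i else a if b == i else None
                    if j is not None and labels[j] == c:
                        costs[c] += w
            labels[i] = int(np.argmin(costs))
        # M-step: recompute cluster centers from the new assignment
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers

# Two well-separated blobs; one must-link and one cannot-link constraint
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels, _ = pckmeans(X, k=2, must_link=[(0, 1)], cannot_link=[(0, 3)])
print(labels)  # [0 0 0 1 1 1]
```

In the distributed setting described by the abstract, each site would run such updates locally and exchange only summary parameters (e.g. centers and metric parameters), rather than raw points, which is what keeps communication cost and privacy exposure low.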


Similar resources

Composite Kernel Optimization in Semi-Supervised Metric

Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...


Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering

Learning distance functions with side information plays a key role in many machine learning and data mining applications. Conventional approaches often assume a Mahalanobis distance function. These approaches are limited in two aspects: (i) they are computationally expensive (even infeasible) for high dimensional data because the size of the metric is in the square of dimensionality; (ii) they ...
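The scalability point in this abstract (the size of the metric is quadratic in the dimensionality) is easy to see concretely: a full Mahalanobis metric is a d × d positive semi-definite matrix. A minimal illustration, with the `mahalanobis_sq` helper assumed for this sketch:

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) under metric M,
    a d x d positive semi-definite matrix."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(diff @ M @ diff)

d = 3
M = np.eye(d)  # identity metric: reduces to plain squared Euclidean distance
print(mahalanobis_sq([1, 0, 0], [0, 0, 0], M))                         # 1.0
print(mahalanobis_sq([1, 0, 0], [0, 0, 0], np.diag([4.0, 1.0, 1.0])))  # 4.0
# A full metric has d*(d+1)/2 free parameters, i.e. O(d^2) in the
# dimensionality -- the growth this abstract identifies as a limitation.
```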


Semi-Supervised Metric Learning Using Pairwise Constraints

Distance metric has an important role in many machine learning algorithms. Recently, metric learning for semi-supervised algorithms has received much attention. For semi-supervised clustering, usually a set of pairwise similarity and dissimilarity constraints is provided as supervisory information. Until now, various metric learning methods utilizing pairwise constraints have been proposed. The...


An investigation on scaling parameter and distance metrics in semi-supervised Fuzzy c-means

The scaling parameter α helps maintain a balance between supervised and unsupervised learning in semi-supervised Fuzzy c-Means (ssFCM). In this study, we investigated the effects of different α values, 0.1, 0.5, 1 and 10 in Pedrycz and Waletzky's ssFCM with various amounts of labelled data, 10%, 20%, 30%, 40%, 50% and 60% and three distance metrics, Euclidean, Mahalanobis and kernel-based on th...


Information-theoretic Semi-supervised Metric Learning via Entropy Regularization

We propose a general information-theoretic approach to semi-supervised metric learning called SERAPH (SEmi-supervised metRic leArning Paradigm with Hypersparsity) that does not rely on the manifold assumption. Given the probability parameterized by a Mahalanobis distance, we maximize its entropy on labeled data and minimize its entropy on unlabeled data following entropy regularization. For met...



Journal title:
  • JDCTA

Volume 4  Issue

Pages  -

Publication date 2010